{
 "cells": [
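  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to HTML Parsing and Preparing for Chained URL Generation\n",
    "![for humans](https://requests-html.kennethreitz.org/_static/requests-html-logo.png#thumbnail)\n",
    "\n",
    "*  This week's main content: HTML parsing and preparing for chained URL generation\n",
    "*  Last week's main content: HTML parsing and XPath practice\n",
    "*  Spring 2020, Web Data Mining, week03\n",
    "*  E-handout designers: 廖汉腾, 许智超\n",
    "<br/>\n",
    "<br/>\n",
    "\n",
    "-----\n",
    "## Review\n",
    "\n",
    "Review of last week's content and practice:\n",
    "\n",
    "* HTML parsing with requests-html\n",
    "* XPath practice\n",
    "* Extracting the key job data from m.liepin.com\n",
    "\n",
    "-----\n",
    "## This Week's Content and Learning Objectives\n",
    "\n",
    "This week focuses on\n",
    "\n",
    "<mark> how to start from a single page and systematically find the content of more pages </mark>\n",
    "\n",
    "To do this, we need to learn how to:\n",
    "\n",
    "1. Break down a URL that carries parameters, then extract the parameters from its query string\n",
    "   a. URL breakdown: use urllib.parse to parse out the query\n",
    "   b. Query breakdown: extract the parameters into a Python dict\n",
    "2. Combine a base URL with a parameter dict to request new pages in a chain\n",
    "\n",
    "In addition, we will keep working on the following challenges that we started facing last week:\n",
    "![Xpath Axis](http://krum.rz.uni-mannheim.de/inet-2005/images/xpath-axis.gif)\n",
    "\n",
    "### Previous Objectives\n",
    "1. Use requests-html to scrape web pages and save them as text files; consult the [requests-html Chinese documentation](https://cncert.github.io/requests-html-doc-cn/#/)\n",
    "2. Get familiar with [XPath syntax](https://www.w3cschool.cn/xpath/xpath-syntax.html) and [XPath nodes](https://www.w3cschool.cn/xpath/xpath-nodes.html)\n",
    "3. Use the [xpath cheatsheet](https://devhints.io/xpath)\n",
    "  * in Chrome Inspector\n",
    "  * in requests-html (Python)\n",
    "4. Make basic use of [pd.DataFrame](https://www.pypandas.cn/doc/getting_started/dsintro.html#dataframe)\n",
    "\n",
    "### New Objectives\n",
    "This week, students will practice:\n",
    "* Extracting the key URL parameters for job listings from the Liepin desktop site, liepin.com\n",
    "* Generating a series of new URLs for further scraping\n",
    "\n",
    "\n"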
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       "/* CSS used by this e-handout */\n",
       "div.code_cell {\n",
       "    background-color: #e5f1fe;\n",
       "}\n",
       "div.cell.selected {\n",
       "    background-color: #effee2;\n",
       "    font-size: 2rem;\n",
       "    line-height: 2.4rem;\n",
       "}\n",
       "div.cell.selected .rendered_html table {\n",
       "    font-size: 2rem !important;\n",
       "    line-height: 2.4rem !important;\n",
       "}\n",
       ".rendered_html pre code {\n",
       "    background-color: #C4E4ff;   \n",
       "    padding: 2px 25px;\n",
       "}\n",
       ".rendered_html pre {\n",
       "    background-color: #99c9ff;\n",
       "}\n",
       "div.code_cell .CodeMirror {\n",
       "    font-size: 2rem !important;\n",
       "    line-height: 2.4rem !important;\n",
       "}\n",
       ".rendered_html img, .rendered_html svg {\n",
       "    max-width: 60%;\n",
       "    height: auto;\n",
       "    float: right;\n",
       "}\n",
       "\n",
       ".rendered_html img[src*=\"#full\"], .rendered_html svg[src*=\"#full\"] {\n",
       "    max-width: 100%;\n",
       "    height: auto;\n",
       "    float: none;\n",
       "}\n",
       "\n",
       ".rendered_html img[src*=\"#thumbnail\"], .rendered_html svg[src*=\"#thumbnail\"] {\n",
       "    max-width: 15%;\n",
       "    height: auto;\n",
       "}\n",
       "\n",
       "/* Gradient transparent - color - transparent */\n",
       "hr {\n",
       "    border: 0;\n",
       "    border-bottom: 1px dashed #ccc;\n",
       "}\n",
       ".emoticon{\n",
       "    font-size: 5rem;\n",
       "    line-height: 4.4rem;\n",
       "    text-align: center;\n",
       "    vertical-align: middle;\n",
       "}\n",
       ".bg-split_apply_comine {\n",
       "    width: 500px;     \n",
       "    height: 300px;\n",
       "    background: url('02_split-apply-comine_500x300.png') -10px -10px;\n",
       "    float: right;\n",
       "}\n",
       ".bg-comine {\n",
       "    width: 175px;\n",
       "    height: 150px;\n",
       "    background: url('02_split-apply-comine_500x300.png') -280px -80px;\n",
       "    float: right;\n",
       "}\n",
       ".bg-apply {\n",
       "    width: 155px;\n",
       "    height: 225px;\n",
       "    background: url('02_split-apply-comine_500x300.png') -160px -30px;\n",
       "    float: right;\n",
       "}\n",
       ".bg-split {\n",
       "    width: 205px;\n",
       "    height: 225px;\n",
       "    background: url('02_split-apply-comine_500x300.png') -10px -30px;\n",
       "    float: right;\n",
       "}\n",
       ".break {\n",
       "                   page-break-after: right; \n",
       "                   width:700px;\n",
       "                   clear:both;\n",
       "}\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
    ],
   "source": [
    "%%html\n",
    "<style>\n",
    "/* CSS used by this e-handout */\n",
    "div.code_cell {\n",
    "    background-color: #e5f1fe;\n",
    "}\n",
    "div.cell.selected {\n",
    "    background-color: #effee2;\n",
    "    font-size: 2rem;\n",
    "    line-height: 2.4rem;\n",
    "}\n",
    "div.cell.selected .rendered_html table {\n",
    "    font-size: 2rem !important;\n",
    "    line-height: 2.4rem !important;\n",
    "}\n",
    ".rendered_html pre code {\n",
    "    background-color: #C4E4ff;   \n",
    "    padding: 2px 25px;\n",
    "}\n",
    ".rendered_html pre {\n",
    "    background-color: #99c9ff;\n",
    "}\n",
    "div.code_cell .CodeMirror {\n",
    "    font-size: 2rem !important;\n",
    "    line-height: 2.4rem !important;\n",
    "}\n",
    ".rendered_html img, .rendered_html svg {\n",
    "    max-width: 60%;\n",
    "    height: auto;\n",
    "    float: right;\n",
    "}\n",
    "\n",
    ".rendered_html img[src*=\"#full\"], .rendered_html svg[src*=\"#full\"] {\n",
    "    max-width: 100%;\n",
    "    height: auto;\n",
    "    float: none;\n",
    "}\n",
    "\n",
    ".rendered_html img[src*=\"#thumbnail\"], .rendered_html svg[src*=\"#thumbnail\"] {\n",
    "    max-width: 15%;\n",
    "    height: auto;\n",
    "}\n",
    "\n",
    "/* Gradient transparent - color - transparent */\n",
    "hr {\n",
    "    border: 0;\n",
    "    border-bottom: 1px dashed #ccc;\n",
    "}\n",
    ".emoticon{\n",
    "    font-size: 5rem;\n",
    "    line-height: 4.4rem;\n",
    "    text-align: center;\n",
    "    vertical-align: middle;\n",
    "}\n",
    ".bg-split_apply_comine {\n",
    "    width: 500px;     \n",
    "    height: 300px;\n",
    "    background: url('02_split-apply-comine_500x300.png') -10px -10px;\n",
    "    float: right;\n",
    "}\n",
    ".bg-comine {\n",
    "    width: 175px;\n",
    "    height: 150px;\n",
    "    background: url('02_split-apply-comine_500x300.png') -280px -80px;\n",
    "    float: right;\n",
    "}\n",
    ".bg-apply {\n",
    "    width: 155px;\n",
    "    height: 225px;\n",
    "    background: url('02_split-apply-comine_500x300.png') -160px -30px;\n",
    "    float: right;\n",
    "}\n",
    ".bg-split {\n",
    "    width: 205px;\n",
    "    height: 225px;\n",
    "    background: url('02_split-apply-comine_500x300.png') -10px -30px;\n",
    "    float: right;\n",
    "}\n",
    ".break {\n",
    "                   page-break-after: right; \n",
    "                   width:700px;\n",
    "                   clear:both;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Core modules\n",
    "import pandas as pd\n",
    "from requests_html import HTMLSession"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 0. Solution to Last Week's Bonus Assignment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[60, 60, 60, 60, 60, 60, 60, 60]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>职称</th>\n",
       "      <th>薪水</th>\n",
       "      <th>公司地点</th>\n",
       "      <th>公司名称</th>\n",
       "      <th>时间</th>\n",
       "      <th>经验</th>\n",
       "      <th>链结</th>\n",
       "      <th>公司URL</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>旅游产品经理</td>\n",
       "      <td>12-20k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>前海爱讯科技(深圳)有限公司</td>\n",
       "      <td>16小时前</td>\n",
       "      <td>2年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1926703515.shtml</td>\n",
       "      <td>https://m.liepin.com/company/8972310/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>教育科技 软件产品经理</td>\n",
       "      <td>12-18k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>融捷投资控股集团</td>\n",
       "      <td>23小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1922705123.shtml</td>\n",
       "      <td>https://m.liepin.com/company/8025674/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>12-18k·12薪</td>\n",
       "      <td>广州-海珠区</td>\n",
       "      <td>广州大白互联网科技有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1922402715.shtml</td>\n",
       "      <td>https://m.liepin.com/company/8695948/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>实施经理</td>\n",
       "      <td>16-23k·12薪</td>\n",
       "      <td>广州-大沙</td>\n",
       "      <td>广东卓志供应链服务集团有限公司</td>\n",
       "      <td>2020-03-23</td>\n",
       "      <td>5年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1924985573.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9238204/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>互联网产品经理</td>\n",
       "      <td>10-15k·12薪</td>\n",
       "      <td>广州-琶洲</td>\n",
       "      <td>广东车海洋环保科技有限公司</td>\n",
       "      <td>2020-03-20</td>\n",
       "      <td>3年以上 大专及以上</td>\n",
       "      <td>https://m.liepin.com/job/1917453193.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9256869/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>后台产品经理</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>广东南方新媒体股份有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1925126353.shtml</td>\n",
       "      <td>https://m.liepin.com/company/7889168/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>区块链产品经理</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>广州-黄埔区</td>\n",
       "      <td>北京普瑞未来教育科技集团有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>3年以上 大专及以上</td>\n",
       "      <td>https://m.liepin.com/job/1919835727.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9989029/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>高级产品经理</td>\n",
       "      <td>20-25k·13薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>某软件开发企业</td>\n",
       "      <td>昨天</td>\n",
       "      <td>3年以上 大专及以上</td>\n",
       "      <td>https://m.liepin.com/a/18948933.shtml</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>产品经理（电商系统）</td>\n",
       "      <td>25-40k·14薪</td>\n",
       "      <td>广东,深圳,广州</td>\n",
       "      <td>知名跨境电商公司</td>\n",
       "      <td>昨天</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/a/18705133.shtml</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>WMS产品经理</td>\n",
       "      <td>20-35k·14薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>某知名跨境电商平台</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>2年以上 学历不限</td>\n",
       "      <td>https://m.liepin.com/a/18963147.shtml</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>产品经理（支付/后端）</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "      <td>广州-海珠区</td>\n",
       "      <td>北京路客互联网科技有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1917750895.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9284656/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>产品总监</td>\n",
       "      <td>50-70k·13薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>名创优品</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>8年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1925389277.shtml</td>\n",
       "      <td>https://m.liepin.com/company/8392675/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>产品专员</td>\n",
       "      <td>5-8k·12薪</td>\n",
       "      <td>广州-海珠区</td>\n",
       "      <td>广州三易互联网科技有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>经验不限 学历不限</td>\n",
       "      <td>https://m.liepin.com/job/1922364281.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9647941/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>产品助理</td>\n",
       "      <td>5-8k·13薪</td>\n",
       "      <td>广州-海珠区</td>\n",
       "      <td>广州三易互联网科技有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>经验不限 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1922356557.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9647941/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>广州易达建信科技开发有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>1年以上 大专及以上</td>\n",
       "      <td>https://m.liepin.com/job/1919464529.shtml</td>\n",
       "      <td>https://m.liepin.com/company/5493174/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>14-22k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>锦江信息技术(广州)有限公司</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1919024715.shtml</td>\n",
       "      <td>https://m.liepin.com/company/8973053/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>供应链产品经理</td>\n",
       "      <td>10-23k·12薪</td>\n",
       "      <td>广州-黄埔区</td>\n",
       "      <td>健客网</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1914662183.shtml</td>\n",
       "      <td>https://m.liepin.com/company/582047/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>资深产品经理（相机产品）</td>\n",
       "      <td>面议</td>\n",
       "      <td>广州</td>\n",
       "      <td>网易集团</td>\n",
       "      <td>32分钟前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1926534703.shtml</td>\n",
       "      <td>https://m.liepin.com/company/5964833/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>产品经理（OA）</td>\n",
       "      <td>10-15k·12薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>佛山市艾臣家居科技有限公司</td>\n",
       "      <td>21分钟前</td>\n",
       "      <td>3年以上 学历不限</td>\n",
       "      <td>https://m.liepin.com/job/1919955237.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9220328/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>商品经理/产品经理</td>\n",
       "      <td>12-18k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>广州苑同电子商务有限公司</td>\n",
       "      <td>14小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1927082955.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9379317/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>广州-海珠区</td>\n",
       "      <td>青木数字技术股份有限公司</td>\n",
       "      <td>14小时前</td>\n",
       "      <td>5年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1927082439.shtml</td>\n",
       "      <td>https://m.liepin.com/company/12191983/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>商品经理</td>\n",
       "      <td>12-18k·12薪</td>\n",
       "      <td>广州-珠江新城</td>\n",
       "      <td>广州苑同电子商务有限公司</td>\n",
       "      <td>14小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1927082223.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9379317/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>产品经理（校园招聘）</td>\n",
       "      <td>8-12k·13薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>佳都新太科技</td>\n",
       "      <td>16小时前</td>\n",
       "      <td>经验不限 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1927075137.shtml</td>\n",
       "      <td>https://m.liepin.com/company/2115085/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>项目经理</td>\n",
       "      <td>12-16k·12薪</td>\n",
       "      <td>广州-番禺区</td>\n",
       "      <td>广州鲸睿科技有限公司</td>\n",
       "      <td>21小时前</td>\n",
       "      <td>5年以上 大专及以上</td>\n",
       "      <td>https://m.liepin.com/job/1927062195.shtml</td>\n",
       "      <td>https://m.liepin.com/company/12190211/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>WXG03-微信公众号小程序生活服务行业产品经理（广州）</td>\n",
       "      <td>面议</td>\n",
       "      <td>广州</td>\n",
       "      <td>腾讯</td>\n",
       "      <td>15小时前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1927010729.shtml</td>\n",
       "      <td>https://m.liepin.com/company/7983148/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>大数据产品经理（高级）</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "      <td>广州-元岗</td>\n",
       "      <td>广州丰石科技有限公司</td>\n",
       "      <td>20小时前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1927002211.shtml</td>\n",
       "      <td>https://m.liepin.com/company/8970680/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>数据技术架构师（南京/广州/西安）</td>\n",
       "      <td>18-30k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>南京海翊数据技术有限公司</td>\n",
       "      <td>17小时前</td>\n",
       "      <td>5年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1926956769.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9724077/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>高级业务架构师（数字化新零售）</td>\n",
       "      <td>面议</td>\n",
       "      <td>广州-海珠区</td>\n",
       "      <td>广州滴普科技有限公司</td>\n",
       "      <td>23小时前</td>\n",
       "      <td>8年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1926800719.shtml</td>\n",
       "      <td>https://m.liepin.com/company/10166945/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>产品经理（智能终端产品）</td>\n",
       "      <td>面议</td>\n",
       "      <td>广州</td>\n",
       "      <td>佳都新太科技</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1926797053.shtml</td>\n",
       "      <td>https://m.liepin.com/company/2115085/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>产品经理（营收）</td>\n",
       "      <td>20-30k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>上海翡翠东方网络信息技术有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>经验不限 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1926712533.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9947855/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>融捷健康智能电子公司-智能硬件产品经理</td>\n",
       "      <td>面议</td>\n",
       "      <td>广州</td>\n",
       "      <td>融捷投资控股集团</td>\n",
       "      <td>23小时前</td>\n",
       "      <td>5年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1926705941.shtml</td>\n",
       "      <td>https://m.liepin.com/company/8025674/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>产品经理-内容优化方向</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>上海翡翠东方网络信息技术有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>经验不限 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1926699881.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9947855/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>平台SDK产品经理</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>上海翡翠东方网络信息技术有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>经验不限 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1926699879.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9947855/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>账号产品经理</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>上海翡翠东方网络信息技术有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1926647497.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9947855/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>商家产品经理</td>\n",
       "      <td>25-50k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>Fordeal</td>\n",
       "      <td>12小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1926621665.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9644389/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>20-40k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>卓尔人人</td>\n",
       "      <td>15小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1926419233.shtml</td>\n",
       "      <td>https://m.liepin.com/company/12146335/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>产品经理（用户体验改善）</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>嘟比英语</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1926412121.shtml</td>\n",
       "      <td>https://m.liepin.com/company/12166375/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>15-20k·13薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>广州诚迈信息科技有限公司</td>\n",
       "      <td>23小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1926106673.shtml</td>\n",
       "      <td>https://m.liepin.com/company/10063493/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>高级产品经理</td>\n",
       "      <td>25-35k·15薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>上海翡翠东方网络信息技术有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1925922019.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9947855/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>产品专员</td>\n",
       "      <td>6-10k·15薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>上海翡翠东方网络信息技术有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>1年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1925921709.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9947855/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>高级产品经理(J10274)</td>\n",
       "      <td>14-20k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>广州金鹏集团有限公司</td>\n",
       "      <td>23小时前</td>\n",
       "      <td>3年以上 大专及以上</td>\n",
       "      <td>https://m.liepin.com/job/1925674943.shtml</td>\n",
       "      <td>https://m.liepin.com/company/7999640/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>面议</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>上海源慧信息科技股份有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>4年以上 大专及以上</td>\n",
       "      <td>https://m.liepin.com/job/1925668381.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9577680/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>8-13k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>深圳合众财富金融投资管理有限公司</td>\n",
       "      <td>18小时前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1925564195.shtml</td>\n",
       "      <td>https://m.liepin.com/company/8634255/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>ATS需求分析师</td>\n",
       "      <td>面议</td>\n",
       "      <td>广州</td>\n",
       "      <td>佳都新太科技</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1925556345.shtml</td>\n",
       "      <td>https://m.liepin.com/company/2115085/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>产品经理（临床科研）</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>健康互联(广州)信息科技股份有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1925540179.shtml</td>\n",
       "      <td>https://m.liepin.com/company/10087541/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>产品经理-供应链金融</td>\n",
       "      <td>15-30k·12薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>TCL金融</td>\n",
       "      <td>23小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1925519307.shtml</td>\n",
       "      <td>https://m.liepin.com/company/7876336/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>直播产品经理</td>\n",
       "      <td>15-25k·15薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>上海翡翠东方网络信息技术有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1924987385.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9947855/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>产品经理（全自动运行方向）</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>佳都新太科技</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1924819589.shtml</td>\n",
       "      <td>https://m.liepin.com/company/2115085/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>产品经理（节能控制方向）</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>佳都新太科技</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1924819521.shtml</td>\n",
       "      <td>https://m.liepin.com/company/2115085/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>需求分析师（会计运营方向）</td>\n",
       "      <td>12-18k·12薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>北京公瑾科技有限公司广州分公司</td>\n",
       "      <td>21小时前</td>\n",
       "      <td>2年以上 大专及以上</td>\n",
       "      <td>https://m.liepin.com/job/1924806009.shtml</td>\n",
       "      <td>https://m.liepin.com/company/10095399/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50</th>\n",
       "      <td>产品经理（智能运维方向）</td>\n",
       "      <td>10-20k·14薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>佳都新太科技</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>5年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1924549497.shtml</td>\n",
       "      <td>https://m.liepin.com/company/2115085/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>产品经理（数字孪生方向）</td>\n",
       "      <td>10-20k·14薪</td>\n",
       "      <td>广州</td>\n",
       "      <td>佳都新太科技</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>5年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1924549389.shtml</td>\n",
       "      <td>https://m.liepin.com/company/2115085/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>8-10k·12薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>广东高乐教育科技有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1924467355.shtml</td>\n",
       "      <td>https://m.liepin.com/company/10156263/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>15-25k·13薪</td>\n",
       "      <td>广州-五山</td>\n",
       "      <td>广东倍智人才网络科技有限公司</td>\n",
       "      <td>12小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1924391239.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9429345/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54</th>\n",
       "      <td>直播营收产品运营经理</td>\n",
       "      <td>20-30k·15薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>上海翡翠东方网络信息技术有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1924139323.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9947855/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55</th>\n",
       "      <td>产品经理</td>\n",
       "      <td>面议</td>\n",
       "      <td>广州-黄花岗</td>\n",
       "      <td>广州奥凯信息咨询有限公司</td>\n",
       "      <td>23小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "      <td>https://m.liepin.com/job/1923601039.shtml</td>\n",
       "      <td>https://m.liepin.com/company/1547151/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56</th>\n",
       "      <td>产品经理/总监</td>\n",
       "      <td>30-40k·12薪</td>\n",
       "      <td>广州-黄埔区</td>\n",
       "      <td>金财互联数据服务有限公司</td>\n",
       "      <td>12小时前</td>\n",
       "      <td>8年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1923570067.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9292058/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57</th>\n",
       "      <td>高级产品经理</td>\n",
       "      <td>18-25k·13薪</td>\n",
       "      <td>广州-番禺区</td>\n",
       "      <td>广州优盟广告策划有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1923227909.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9287730/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58</th>\n",
       "      <td>供应链高级产品经理</td>\n",
       "      <td>20-30k·13薪</td>\n",
       "      <td>广州-天河北</td>\n",
       "      <td>广州市安发网络科技有限公司</td>\n",
       "      <td>16小时前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1922691351.shtml</td>\n",
       "      <td>https://m.liepin.com/company/9261841/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59</th>\n",
       "      <td>产品策划经理</td>\n",
       "      <td>8-12k·12薪</td>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>广东高乐教育科技有限公司</td>\n",
       "      <td>22小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "      <td>https://m.liepin.com/job/1922519235.shtml</td>\n",
       "      <td>https://m.liepin.com/company/10156263/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               职称          薪水      公司地点                公司名称  \\\n",
       "0                         旅游产品经理   12-20k·12薪        广州      前海爱讯科技(深圳)有限公司   \n",
       "1                    教育科技 软件产品经理   12-18k·12薪        广州            融捷投资控股集团   \n",
       "2                           产品经理   12-18k·12薪    广州-海珠区       广州大白互联网科技有限公司   \n",
       "3                           实施经理   16-23k·12薪     广州-大沙     广东卓志供应链服务集团有限公司   \n",
       "4                        互联网产品经理   10-15k·12薪     广州-琶洲       广东车海洋环保科技有限公司   \n",
       "5                         后台产品经理   10-20k·12薪        广州       广东南方新媒体股份有限公司   \n",
       "6                        区块链产品经理   15-25k·12薪    广州-黄埔区    北京普瑞未来教育科技集团有限公司   \n",
       "7                         高级产品经理   20-25k·13薪        广州             某软件开发企业   \n",
       "8                     产品经理（电商系统）   25-40k·14薪  广东,深圳,广州            知名跨境电商公司   \n",
       "9                        WMS产品经理   20-35k·14薪        广州           某知名跨境电商平台   \n",
       "10                   产品经理（支付/后端）   10-20k·12薪    广州-海珠区       北京路客互联网科技有限公司   \n",
       "11                          产品总监   50-70k·13薪        广州                名创优品   \n",
       "12                          产品专员     5-8k·12薪    广州-海珠区       广州三易互联网科技有限公司   \n",
       "13                          产品助理     5-8k·13薪    广州-海珠区       广州三易互联网科技有限公司   \n",
       "14                          产品经理   10-20k·12薪    广州-天河区      广州易达建信科技开发有限公司   \n",
       "15                          产品经理   14-22k·12薪        广州      锦江信息技术(广州)有限公司   \n",
       "16                       供应链产品经理   10-23k·12薪    广州-黄埔区                 健客网   \n",
       "17                  资深产品经理（相机产品）           面议        广州                网易集团   \n",
       "18                      产品经理（OA）   10-15k·12薪    广州-天河区       佛山市艾臣家居科技有限公司   \n",
       "19                     商品经理/产品经理   12-18k·12薪        广州        广州苑同电子商务有限公司   \n",
       "20                          产品经理   15-25k·12薪    广州-海珠区        青木数字技术股份有限公司   \n",
       "21                          商品经理   12-18k·12薪   广州-珠江新城        广州苑同电子商务有限公司   \n",
       "22                    产品经理（校园招聘）    8-12k·13薪        广州              佳都新太科技   \n",
       "23                          项目经理   12-16k·12薪    广州-番禺区          广州鲸睿科技有限公司   \n",
       "24  WXG03-微信公众号小程序生活服务行业产品经理（广州）           面议        广州                  腾讯   \n",
       "25                   大数据产品经理（高级）   10-20k·12薪     广州-元岗          广州丰石科技有限公司   \n",
       "26             数据技术架构师（南京/广州/西安）   18-30k·12薪        广州        南京海翊数据技术有限公司   \n",
       "27               高级业务架构师（数字化新零售）           面议    广州-海珠区          广州滴普科技有限公司   \n",
       "28                  产品经理（智能终端产品）           面议        广州              佳都新太科技   \n",
       "29                      产品经理（营收）   20-30k·12薪        广州    上海翡翠东方网络信息技术有限公司   \n",
       "30           融捷健康智能电子公司-智能硬件产品经理           面议        广州            融捷投资控股集团   \n",
       "31                   产品经理-内容优化方向   15-25k·12薪        广州    上海翡翠东方网络信息技术有限公司   \n",
       "32                     平台SDK产品经理   15-25k·12薪        广州    上海翡翠东方网络信息技术有限公司   \n",
       "33                        账号产品经理   15-25k·12薪        广州    上海翡翠东方网络信息技术有限公司   \n",
       "34                        商家产品经理   25-50k·12薪        广州             Fordeal   \n",
       "35                          产品经理   20-40k·12薪        广州                卓尔人人   \n",
       "36                  产品经理（用户体验改善）   15-25k·12薪    广州-天河区                嘟比英语   \n",
       "37                          产品经理   15-20k·13薪        广州        广州诚迈信息科技有限公司   \n",
       "38                        高级产品经理   25-35k·15薪    广州-天河区    上海翡翠东方网络信息技术有限公司   \n",
       "39                          产品专员    6-10k·15薪    广州-天河区    上海翡翠东方网络信息技术有限公司   \n",
       "40                高级产品经理(J10274)   14-20k·12薪        广州          广州金鹏集团有限公司   \n",
       "41                          产品经理           面议    广州-天河区      上海源慧信息科技股份有限公司   \n",
       "42                          产品经理    8-13k·12薪        广州    深圳合众财富金融投资管理有限公司   \n",
       "43                      ATS需求分析师           面议        广州              佳都新太科技   \n",
       "44                    产品经理（临床科研）   15-25k·12薪        广州  健康互联(广州)信息科技股份有限公司   \n",
       "45                    产品经理-供应链金融   15-30k·12薪    广州-天河区               TCL金融   \n",
       "46                        直播产品经理   15-25k·15薪    广州-天河区    上海翡翠东方网络信息技术有限公司   \n",
       "47                 产品经理（全自动运行方向）   10-20k·12薪        广州              佳都新太科技   \n",
       "48                  产品经理（节能控制方向）   10-20k·12薪        广州              佳都新太科技   \n",
       "49                 需求分析师（会计运营方向）   12-18k·12薪    广州-天河区     北京公瑾科技有限公司广州分公司   \n",
       "50                  产品经理（智能运维方向）   10-20k·14薪        广州              佳都新太科技   \n",
       "51                  产品经理（数字孪生方向）   10-20k·14薪        广州              佳都新太科技   \n",
       "52                          产品经理    8-10k·12薪    广州-天河区        广东高乐教育科技有限公司   \n",
       "53                          产品经理   15-25k·13薪     广州-五山      广东倍智人才网络科技有限公司   \n",
       "54                    直播营收产品运营经理   20-30k·15薪    广州-天河区    上海翡翠东方网络信息技术有限公司   \n",
       "55                          产品经理           面议    广州-黄花岗        广州奥凯信息咨询有限公司   \n",
       "56                       产品经理/总监   30-40k·12薪    广州-黄埔区        金财互联数据服务有限公司   \n",
       "57                        高级产品经理   18-25k·13薪    广州-番禺区        广州优盟广告策划有限公司   \n",
       "58                     供应链高级产品经理   20-30k·13薪    广州-天河北       广州市安发网络科技有限公司   \n",
       "59                        产品策划经理    8-12k·12薪    广州-天河区        广东高乐教育科技有限公司   \n",
       "\n",
       "            时间          经验                                         链结  \\\n",
       "0        16小时前   2年以上 统招本科  https://m.liepin.com/job/1926703515.shtml   \n",
       "1        23小时前   3年以上 统招本科  https://m.liepin.com/job/1922705123.shtml   \n",
       "2        22小时前  2年以上 本科及以上  https://m.liepin.com/job/1922402715.shtml   \n",
       "3   2020-03-23   5年以上 统招本科  https://m.liepin.com/job/1924985573.shtml   \n",
       "4   2020-03-20  3年以上 大专及以上  https://m.liepin.com/job/1917453193.shtml   \n",
       "5         一个月前  3年以上 本科及以上  https://m.liepin.com/job/1925126353.shtml   \n",
       "6         一个月前  3年以上 大专及以上  https://m.liepin.com/job/1919835727.shtml   \n",
       "7           昨天  3年以上 大专及以上      https://m.liepin.com/a/18948933.shtml   \n",
       "8           昨天   3年以上 统招本科      https://m.liepin.com/a/18705133.shtml   \n",
       "9         一个月前   2年以上 学历不限      https://m.liepin.com/a/18963147.shtml   \n",
       "10        一个月前  3年以上 本科及以上  https://m.liepin.com/job/1917750895.shtml   \n",
       "11        一个月前   8年以上 统招本科  https://m.liepin.com/job/1925389277.shtml   \n",
       "12        一个月前   经验不限 学历不限  https://m.liepin.com/job/1922364281.shtml   \n",
       "13        一个月前  经验不限 本科及以上  https://m.liepin.com/job/1922356557.shtml   \n",
       "14        一个月前  1年以上 大专及以上  https://m.liepin.com/job/1919464529.shtml   \n",
       "15        一个月前  5年以上 本科及以上  https://m.liepin.com/job/1919024715.shtml   \n",
       "16        一个月前  2年以上 本科及以上  https://m.liepin.com/job/1914662183.shtml   \n",
       "17       32分钟前  5年以上 本科及以上  https://m.liepin.com/job/1926534703.shtml   \n",
       "18       21分钟前   3年以上 学历不限  https://m.liepin.com/job/1919955237.shtml   \n",
       "19       14小时前  3年以上 本科及以上  https://m.liepin.com/job/1927082955.shtml   \n",
       "20       14小时前   5年以上 统招本科  https://m.liepin.com/job/1927082439.shtml   \n",
       "21       14小时前  3年以上 本科及以上  https://m.liepin.com/job/1927082223.shtml   \n",
       "22       16小时前   经验不限 统招本科  https://m.liepin.com/job/1927075137.shtml   \n",
       "23       21小时前  5年以上 大专及以上  https://m.liepin.com/job/1927062195.shtml   \n",
       "24       15小时前  2年以上 本科及以上  https://m.liepin.com/job/1927010729.shtml   \n",
       "25       20小时前  5年以上 本科及以上  https://m.liepin.com/job/1927002211.shtml   \n",
       "26       17小时前   5年以上 统招本科  https://m.liepin.com/job/1926956769.shtml   \n",
       "27       23小时前  8年以上 本科及以上  https://m.liepin.com/job/1926800719.shtml   \n",
       "28       22小时前   3年以上 统招本科  https://m.liepin.com/job/1926797053.shtml   \n",
       "29       22小时前  经验不限 本科及以上  https://m.liepin.com/job/1926712533.shtml   \n",
       "30       23小时前   5年以上 统招本科  https://m.liepin.com/job/1926705941.shtml   \n",
       "31       22小时前  经验不限 本科及以上  https://m.liepin.com/job/1926699881.shtml   \n",
       "32       22小时前  经验不限 本科及以上  https://m.liepin.com/job/1926699879.shtml   \n",
       "33       22小时前  3年以上 本科及以上  https://m.liepin.com/job/1926647497.shtml   \n",
       "34       12小时前  3年以上 本科及以上  https://m.liepin.com/job/1926621665.shtml   \n",
       "35       15小时前   3年以上 统招本科  https://m.liepin.com/job/1926419233.shtml   \n",
       "36       22小时前   3年以上 统招本科  https://m.liepin.com/job/1926412121.shtml   \n",
       "37       23小时前  3年以上 本科及以上  https://m.liepin.com/job/1926106673.shtml   \n",
       "38       22小时前  5年以上 本科及以上  https://m.liepin.com/job/1925922019.shtml   \n",
       "39       22小时前  1年以上 本科及以上  https://m.liepin.com/job/1925921709.shtml   \n",
       "40       23小时前  3年以上 大专及以上  https://m.liepin.com/job/1925674943.shtml   \n",
       "41       22小时前  4年以上 大专及以上  https://m.liepin.com/job/1925668381.shtml   \n",
       "42       18小时前  2年以上 本科及以上  https://m.liepin.com/job/1925564195.shtml   \n",
       "43       22小时前   3年以上 统招本科  https://m.liepin.com/job/1925556345.shtml   \n",
       "44       22小时前  2年以上 本科及以上  https://m.liepin.com/job/1925540179.shtml   \n",
       "45       23小时前   3年以上 统招本科  https://m.liepin.com/job/1925519307.shtml   \n",
       "46       22小时前  3年以上 本科及以上  https://m.liepin.com/job/1924987385.shtml   \n",
       "47       22小时前  3年以上 本科及以上  https://m.liepin.com/job/1924819589.shtml   \n",
       "48       22小时前  3年以上 本科及以上  https://m.liepin.com/job/1924819521.shtml   \n",
       "49       21小时前  2年以上 大专及以上  https://m.liepin.com/job/1924806009.shtml   \n",
       "50       22小时前   5年以上 统招本科  https://m.liepin.com/job/1924549497.shtml   \n",
       "51       22小时前   5年以上 统招本科  https://m.liepin.com/job/1924549389.shtml   \n",
       "52       22小时前  2年以上 本科及以上  https://m.liepin.com/job/1924467355.shtml   \n",
       "53       12小时前   3年以上 统招本科  https://m.liepin.com/job/1924391239.shtml   \n",
       "54       22小时前  3年以上 本科及以上  https://m.liepin.com/job/1924139323.shtml   \n",
       "55       23小时前   3年以上 统招本科  https://m.liepin.com/job/1923601039.shtml   \n",
       "56       12小时前  8年以上 本科及以上  https://m.liepin.com/job/1923570067.shtml   \n",
       "57       22小时前  3年以上 本科及以上  https://m.liepin.com/job/1923227909.shtml   \n",
       "58       16小时前  5年以上 本科及以上  https://m.liepin.com/job/1922691351.shtml   \n",
       "59       22小时前  3年以上 本科及以上  https://m.liepin.com/job/1922519235.shtml   \n",
       "\n",
       "                                     公司URL  \n",
       "0    https://m.liepin.com/company/8972310/  \n",
       "1    https://m.liepin.com/company/8025674/  \n",
       "2    https://m.liepin.com/company/8695948/  \n",
       "3    https://m.liepin.com/company/9238204/  \n",
       "4    https://m.liepin.com/company/9256869/  \n",
       "5    https://m.liepin.com/company/7889168/  \n",
       "6    https://m.liepin.com/company/9989029/  \n",
       "7                                           \n",
       "8                                           \n",
       "9                                           \n",
       "10   https://m.liepin.com/company/9284656/  \n",
       "11   https://m.liepin.com/company/8392675/  \n",
       "12   https://m.liepin.com/company/9647941/  \n",
       "13   https://m.liepin.com/company/9647941/  \n",
       "14   https://m.liepin.com/company/5493174/  \n",
       "15   https://m.liepin.com/company/8973053/  \n",
       "16    https://m.liepin.com/company/582047/  \n",
       "17   https://m.liepin.com/company/5964833/  \n",
       "18   https://m.liepin.com/company/9220328/  \n",
       "19   https://m.liepin.com/company/9379317/  \n",
       "20  https://m.liepin.com/company/12191983/  \n",
       "21   https://m.liepin.com/company/9379317/  \n",
       "22   https://m.liepin.com/company/2115085/  \n",
       "23  https://m.liepin.com/company/12190211/  \n",
       "24   https://m.liepin.com/company/7983148/  \n",
       "25   https://m.liepin.com/company/8970680/  \n",
       "26   https://m.liepin.com/company/9724077/  \n",
       "27  https://m.liepin.com/company/10166945/  \n",
       "28   https://m.liepin.com/company/2115085/  \n",
       "29   https://m.liepin.com/company/9947855/  \n",
       "30   https://m.liepin.com/company/8025674/  \n",
       "31   https://m.liepin.com/company/9947855/  \n",
       "32   https://m.liepin.com/company/9947855/  \n",
       "33   https://m.liepin.com/company/9947855/  \n",
       "34   https://m.liepin.com/company/9644389/  \n",
       "35  https://m.liepin.com/company/12146335/  \n",
       "36  https://m.liepin.com/company/12166375/  \n",
       "37  https://m.liepin.com/company/10063493/  \n",
       "38   https://m.liepin.com/company/9947855/  \n",
       "39   https://m.liepin.com/company/9947855/  \n",
       "40   https://m.liepin.com/company/7999640/  \n",
       "41   https://m.liepin.com/company/9577680/  \n",
       "42   https://m.liepin.com/company/8634255/  \n",
       "43   https://m.liepin.com/company/2115085/  \n",
       "44  https://m.liepin.com/company/10087541/  \n",
       "45   https://m.liepin.com/company/7876336/  \n",
       "46   https://m.liepin.com/company/9947855/  \n",
       "47   https://m.liepin.com/company/2115085/  \n",
       "48   https://m.liepin.com/company/2115085/  \n",
       "49  https://m.liepin.com/company/10095399/  \n",
       "50   https://m.liepin.com/company/2115085/  \n",
       "51   https://m.liepin.com/company/2115085/  \n",
       "52  https://m.liepin.com/company/10156263/  \n",
       "53   https://m.liepin.com/company/9429345/  \n",
       "54   https://m.liepin.com/company/9947855/  \n",
       "55   https://m.liepin.com/company/1547151/  \n",
       "56   https://m.liepin.com/company/9292058/  \n",
       "57   https://m.liepin.com/company/9287730/  \n",
       "58   https://m.liepin.com/company/9261841/  \n",
       "59  https://m.liepin.com/company/10156263/  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# C-1   单一页面\n",
    "url = \"https://m.liepin.com/zhaopin/?keyword=PRD\"\n",
    "session = HTMLSession()\n",
    "r = session.get( url )\n",
    "\n",
    "# C-5\n",
    "# 难: '公司URL', '时间', '经验'\n",
    "\n",
    "# 先取特定元素, 精准打击其子后辈\n",
    "主要元素 = r.html.xpath( \\\n",
    "    '//div[@class=\"job-card-wrap\"]//div[@class=\"job-card\"]')\n",
    "\n",
    "# 作为xpath字典，键为我要抓的牛肉名称，值为xpath\n",
    "dict_xpaths={ \n",
    "    'text': {\n",
    "        '经验':      './/ul/li[time]/text()'\n",
    "    },\n",
    "    'text_content': {\n",
    "        '职称':    './/ul/li/a[contains(@class,\"job-name\")]/span[@class=\"name-text\"]', \n",
    "        '薪水':    './/ul/li/a[contains(@class,\"job-name\")]/following-sibling::span', \n",
    "        '公司地点':'.//ul/li/time/following-sibling::a',\n",
    "        '公司名称': './/ul/li/a[contains(@class,\"company-name\")]', \n",
    "        '时间':    './/ul/li/time', \n",
    "    },\n",
    "    'href': {\n",
    "        '链结':    './/ul/li/a[contains(@class,\"job-name\")]', \n",
    "        '公司URL': './/ul/li/a[contains(@class,\"company-name\")]', \n",
    "    }\n",
    "}\n",
    "\n",
    "def get_e_text_content(_xpath_):\n",
    "    # 高级列表推导\n",
    "    暂存结果 = [e.xpath(_xpath_)[0].lxml.text_content() for e in 主要元素]\n",
    "    return(暂存结果)\n",
    "\n",
    "def get_e_text(_xpath_):\n",
    "    # 高级列表推导\n",
    "    暂存结果 = [\"\".join([x.strip() for x in e.xpath(_xpath_)]) for e in 主要元素]\n",
    "    return(暂存结果)\n",
    "\n",
    "def get_e_href(_xpath_):\n",
    "    # 高级列表推导\n",
    "    暂存结果 = [list(e.xpath(_xpath_, first=True).absolute_links)[0] \\\n",
    "               if len(e.xpath(_xpath_, first=True).absolute_links) >= 1  \\\n",
    "               else \"\" for e in 主要元素]\n",
    "    return(暂存结果)\n",
    "\n",
    "# 只对主要元素下进行.xpath取值\n",
    "数据字典 = dict()\n",
    "\n",
    "数据字典 = {k:get_e_text_content(v) for k,v in dict_xpaths['text_content'].items()}\n",
    "数据字典.update({k:get_e_text(v) for k,v in dict_xpaths['text'].items()})\n",
    "数据字典.update({k:get_e_href(v) for k,v in dict_xpaths['href'].items()})\n",
    "\n",
    "print ([len(v) for k,v in 数据字典.items()])  # 檢查\n",
    "\n",
    "数据 = pd.DataFrame(数据字典)\n",
    "数据.to_excel(\"20春_Web数据挖掘_week02_liepin.xlsx\", sheet_name=\"搜查结果\")\n",
    "数据 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "-----\n",
    "\n",
    "# 本周目标\n",
    "* [猎聘PC版](https://www.liepin.com/zhaopin/)\n",
    "* 上方导航有  公司行业 城市 薪资 的分页选单\n",
    "* 请练习xpath抽出数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Xpath解析HTML"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>edu</th>\n",
       "      <th>经验</th>\n",
       "      <th>薪水</th>\n",
       "      <th>时间</th>\n",
       "      <th>职称</th>\n",
       "      <th>公司地点</th>\n",
       "      <th>公司名称</th>\n",
       "      <th>链结</th>\n",
       "      <th>公司URL</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>财务管理部长</td>\n",
       "      <td>重庆-永川区</td>\n",
       "      <td>山东宏鑫源钢板有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1927090305.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8030311/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>10年以上</td>\n",
       "      <td>12-20k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>集团工程经理</td>\n",
       "      <td>南宁-青秀区</td>\n",
       "      <td>华劲集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1927027679.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7938232/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>嵌入式硬件工程师</td>\n",
       "      <td>南京-江宁区</td>\n",
       "      <td>南高齿集团</td>\n",
       "      <td>https://www.liepin.com/job/1927014087.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8037821/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>寻源采购工程师-国际采购</td>\n",
       "      <td>南京-江宁区</td>\n",
       "      <td>南高齿集团</td>\n",
       "      <td>https://www.liepin.com/job/1927006131.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8037821/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>8年以上</td>\n",
       "      <td>8-10k·14薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>财务分析主管</td>\n",
       "      <td>镇江-丹阳</td>\n",
       "      <td>大亚科技集团</td>\n",
       "      <td>https://www.liepin.com/job/1926767325.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7916453/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>8年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>应用架构师</td>\n",
       "      <td>南宁-青秀区</td>\n",
       "      <td>华劲集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1926721779.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7938232/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>12-15k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>国企行政副总</td>\n",
       "      <td>威海-环翠区</td>\n",
       "      <td>烟台市方程式软件有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1926176983.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9987923/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>13-17k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>政府园区招商副总</td>\n",
       "      <td>威海-环翠区</td>\n",
       "      <td>烟台市方程式软件有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1926176943.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9987923/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>7年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>(Sr.) Sales Account Manager</td>\n",
       "      <td>上海-浦东新区</td>\n",
       "      <td>MATLAB</td>\n",
       "      <td>https://www.liepin.com/job/1925634897.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8185522/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>10-15k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>绩效经理</td>\n",
       "      <td>东莞</td>\n",
       "      <td>正业科技广东</td>\n",
       "      <td>https://www.liepin.com/job/1925579885.shtml</td>\n",
       "      <td>https://www.liepin.com/company/470396/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>8年以上</td>\n",
       "      <td>30-40k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>全国销售总监（第三终端）</td>\n",
       "      <td>广州</td>\n",
       "      <td>海南泰迪医药有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1925577335.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9328149/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>6-10k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>幕墙设计师</td>\n",
       "      <td>深圳</td>\n",
       "      <td>深圳市华剑建设集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1925391261.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7882427/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>硕士及以上</td>\n",
       "      <td>2年以上</td>\n",
       "      <td>20-31k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>法务经理</td>\n",
       "      <td>上海</td>\n",
       "      <td>张江科技</td>\n",
       "      <td>https://www.liepin.com/job/1924963035.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7964307/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>8-15k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>景观工程师</td>\n",
       "      <td>上海</td>\n",
       "      <td>大华</td>\n",
       "      <td>https://www.liepin.com/job/1924621939.shtml</td>\n",
       "      <td>https://www.liepin.com/company/1380401/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>10-15k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>景观设计师</td>\n",
       "      <td>上海</td>\n",
       "      <td>大华</td>\n",
       "      <td>https://www.liepin.com/job/1924620471.shtml</td>\n",
       "      <td>https://www.liepin.com/company/1380401/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>8年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>Production Manager</td>\n",
       "      <td>杭州</td>\n",
       "      <td>采埃孚传动技术(杭州)有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1924467895.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8042465/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>10-15k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>项目主管</td>\n",
       "      <td>上海</td>\n",
       "      <td>大华</td>\n",
       "      <td>https://www.liepin.com/job/1924398333.shtml</td>\n",
       "      <td>https://www.liepin.com/company/1380401/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>5-8k·15薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>财务分析专员</td>\n",
       "      <td>杭州-萧山区</td>\n",
       "      <td>杭州依维柯汽车传动技术有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1924146065.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9796943/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>1年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>质量工程师-过程质量</td>\n",
       "      <td>南京-江宁区</td>\n",
       "      <td>南高齿集团</td>\n",
       "      <td>https://www.liepin.com/job/1921592835.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8037821/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>10-25k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>高级软件工程师</td>\n",
       "      <td>东莞-樟木头</td>\n",
       "      <td>罗曼智能</td>\n",
       "      <td>https://www.liepin.com/job/1919324131.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8732442/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>硕士及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>模拟IC设计工程师</td>\n",
       "      <td>深圳</td>\n",
       "      <td>深圳市芯天下技术有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1918820879.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8239498/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>20-40k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>营销高级经理</td>\n",
       "      <td>南宁</td>\n",
       "      <td>华劲集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1918258855.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7938232/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>经验不限</td>\n",
       "      <td>5-10k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>期货营销</td>\n",
       "      <td>郑州</td>\n",
       "      <td>中原期货</td>\n",
       "      <td>https://www.liepin.com/job/1917852823.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8699497/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>1年以上</td>\n",
       "      <td>5-8k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>税务会计</td>\n",
       "      <td>温州</td>\n",
       "      <td>浙江田野餐饮管理连锁有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1916431327.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9103408/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>1年以上</td>\n",
       "      <td>6-15k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>BIM工程师</td>\n",
       "      <td>深圳</td>\n",
       "      <td>深圳市华剑建设集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1915522635.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7882427/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>8-15k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>广西竹林公司财务经理</td>\n",
       "      <td>南宁</td>\n",
       "      <td>华劲集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1910982541.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7938232/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>10年以上</td>\n",
       "      <td>12-25k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>广西竹林公司营林副总经理</td>\n",
       "      <td>南宁</td>\n",
       "      <td>华劲集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1910337843.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7938232/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>8年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>设备副总经理</td>\n",
       "      <td>赣州</td>\n",
       "      <td>华劲集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/196635577.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7938232/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>16-25k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>集团人资经理</td>\n",
       "      <td>南宁-民族大道</td>\n",
       "      <td>华劲集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/194755693.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7938232/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>13-15k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>案场经理</td>\n",
       "      <td>重庆-九龙坡区</td>\n",
       "      <td>眉山隆和旅游开发有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1927029615.shtml</td>\n",
       "      <td>https://www.liepin.com/company/10149725/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>8-12k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>混凝土实验室主任</td>\n",
       "      <td>苏州</td>\n",
       "      <td>江苏中意包装有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1926976593.shtml</td>\n",
       "      <td>https://www.liepin.com/company/12157481/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>20-30k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>电商总监</td>\n",
       "      <td>广州-大石</td>\n",
       "      <td>海南泰迪医药有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1926915453.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9328149/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>硕士及以上</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>20-30k·13薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>高级电机设计</td>\n",
       "      <td>深圳-南山区</td>\n",
       "      <td>国奥科技(深圳)有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1926839859.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9923085/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>5年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>项目营销负责人</td>\n",
       "      <td>北京</td>\n",
       "      <td>华润置地华北大区</td>\n",
       "      <td>https://www.liepin.com/job/1926333679.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7916548/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>8年以上</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>OTC大区总监</td>\n",
       "      <td>郑州</td>\n",
       "      <td>海南泰迪医药有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1926022811.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9328149/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>1年以上</td>\n",
       "      <td>3-8k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>置业顾问</td>\n",
       "      <td>成都-高新区</td>\n",
       "      <td>眉山隆和旅游开发有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1925878877.shtml</td>\n",
       "      <td>https://www.liepin.com/company/10149725/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>10-15k·13薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>c#软件工程师</td>\n",
       "      <td>上海</td>\n",
       "      <td>旺玖智能科技(上海)有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1925397603.shtml</td>\n",
       "      <td>https://www.liepin.com/company/10286737/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>经验不限</td>\n",
       "      <td>5-8k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>期货合规法务岗</td>\n",
       "      <td>郑州-郑东新区</td>\n",
       "      <td>中原期货</td>\n",
       "      <td>https://www.liepin.com/job/1924999473.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8699497/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>硕士及以上</td>\n",
       "      <td>经验不限</td>\n",
       "      <td>6-12k·12薪</td>\n",
       "      <td>2020年03月31日</td>\n",
       "      <td>信息技术岗</td>\n",
       "      <td>郑州-郑东新区</td>\n",
       "      <td>中原期货</td>\n",
       "      <td>https://www.liepin.com/job/1918198625.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8699497/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>2年以上</td>\n",
       "      <td>8-10k·12薪</td>\n",
       "      <td>2020年03月30日</td>\n",
       "      <td>学术代表-太原</td>\n",
       "      <td></td>\n",
       "      <td>博康健基因</td>\n",
       "      <td>https://www.liepin.com/job/1927087201.shtml</td>\n",
       "      <td>https://www.liepin.com/company/7855495/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      edu     经验          薪水           时间                           职称  \\\n",
       "0    统招本科   5年以上  15-25k·12薪  2020年03月31日                       财务管理部长   \n",
       "1   本科及以上  10年以上  12-20k·12薪  2020年03月31日                       集团工程经理   \n",
       "2   本科及以上   3年以上          面议  2020年03月31日                     嵌入式硬件工程师   \n",
       "3   本科及以上   3年以上          面议  2020年03月31日                 寻源采购工程师-国际采购   \n",
       "4    统招本科   8年以上   8-10k·14薪  2020年03月31日                       财务分析主管   \n",
       "5   本科及以上   8年以上          面议  2020年03月31日                        应用架构师   \n",
       "6   本科及以上   5年以上  12-15k·12薪  2020年03月31日                       国企行政副总   \n",
       "7   本科及以上   5年以上  13-17k·12薪  2020年03月31日                     政府园区招商副总   \n",
       "8   本科及以上   7年以上          面议  2020年03月31日  (Sr.) Sales Account Manager   \n",
       "9   本科及以上   5年以上  10-15k·12薪  2020年03月31日                         绩效经理   \n",
       "10  大专及以上   8年以上  30-40k·12薪  2020年03月31日                 全国销售总监（第三终端）   \n",
       "11  大专及以上   3年以上   6-10k·12薪  2020年03月31日                        幕墙设计师   \n",
       "12  硕士及以上   2年以上  20-31k·12薪  2020年03月31日                         法务经理   \n",
       "13  大专及以上   5年以上   8-15k·12薪  2020年03月31日                        景观工程师   \n",
       "14  本科及以上   5年以上  10-15k·12薪  2020年03月31日                        景观设计师   \n",
       "15  本科及以上   8年以上          面议  2020年03月31日           Production Manager   \n",
       "16  大专及以上   5年以上  10-15k·12薪  2020年03月31日                         项目主管   \n",
       "17   统招本科   3年以上    5-8k·15薪  2020年03月31日                       财务分析专员   \n",
       "18  本科及以上   1年以上          面议  2020年03月31日                   质量工程师-过程质量   \n",
       "19  大专及以上   5年以上  10-25k·12薪  2020年03月31日                      高级软件工程师   \n",
       "20  硕士及以上   5年以上          面议  2020年03月31日                    模拟IC设计工程师   \n",
       "21   统招本科   3年以上  20-40k·12薪  2020年03月31日                       营销高级经理   \n",
       "22  大专及以上   经验不限   5-10k·12薪  2020年03月31日                         期货营销   \n",
       "23  大专及以上   1年以上    5-8k·12薪  2020年03月31日                         税务会计   \n",
       "24  大专及以上   1年以上   6-15k·12薪  2020年03月31日                       BIM工程师   \n",
       "25   统招本科   5年以上   8-15k·12薪  2020年03月31日                   广西竹林公司财务经理   \n",
       "26   统招本科  10年以上  12-25k·12薪  2020年03月31日                 广西竹林公司营林副总经理   \n",
       "27   统招本科   8年以上          面议  2020年03月31日                       设备副总经理   \n",
       "28   统招本科   5年以上  16-25k·12薪  2020年03月31日                       集团人资经理   \n",
       "29   统招本科   3年以上  13-15k·12薪  2020年03月31日                         案场经理   \n",
       "30  大专及以上   3年以上   8-12k·12薪  2020年03月31日                     混凝土实验室主任   \n",
       "31  大专及以上   5年以上  20-30k·12薪  2020年03月31日                         电商总监   \n",
       "32  硕士及以上   5年以上  20-30k·13薪  2020年03月31日                       高级电机设计   \n",
       "33   统招本科   5年以上          面议  2020年03月31日                      项目营销负责人   \n",
       "34  大专及以上   8年以上  15-25k·12薪  2020年03月31日                      OTC大区总监   \n",
       "35  大专及以上   1年以上    3-8k·12薪  2020年03月31日                         置业顾问   \n",
       "36  本科及以上   3年以上  10-15k·13薪  2020年03月31日                      c#软件工程师   \n",
       "37  本科及以上   经验不限    5-8k·12薪  2020年03月31日                      期货合规法务岗   \n",
       "38  硕士及以上   经验不限   6-12k·12薪  2020年03月31日                        信息技术岗   \n",
       "39  大专及以上   2年以上   8-10k·12薪  2020年03月30日                      学术代表-太原   \n",
       "\n",
       "       公司地点             公司名称                                           链结  \\\n",
       "0    重庆-永川区      山东宏鑫源钢板有限公司  https://www.liepin.com/job/1927090305.shtml   \n",
       "1    南宁-青秀区       华劲集团股份有限公司  https://www.liepin.com/job/1927027679.shtml   \n",
       "2    南京-江宁区            南高齿集团  https://www.liepin.com/job/1927014087.shtml   \n",
       "3    南京-江宁区            南高齿集团  https://www.liepin.com/job/1927006131.shtml   \n",
       "4     镇江-丹阳           大亚科技集团  https://www.liepin.com/job/1926767325.shtml   \n",
       "5    南宁-青秀区       华劲集团股份有限公司  https://www.liepin.com/job/1926721779.shtml   \n",
       "6    威海-环翠区     烟台市方程式软件有限公司  https://www.liepin.com/job/1926176983.shtml   \n",
       "7    威海-环翠区     烟台市方程式软件有限公司  https://www.liepin.com/job/1926176943.shtml   \n",
       "8   上海-浦东新区           MATLAB  https://www.liepin.com/job/1925634897.shtml   \n",
       "9        东莞           正业科技广东  https://www.liepin.com/job/1925579885.shtml   \n",
       "10       广州       海南泰迪医药有限公司  https://www.liepin.com/job/1925577335.shtml   \n",
       "11       深圳  深圳市华剑建设集团股份有限公司  https://www.liepin.com/job/1925391261.shtml   \n",
       "12       上海             张江科技  https://www.liepin.com/job/1924963035.shtml   \n",
       "13       上海               大华  https://www.liepin.com/job/1924621939.shtml   \n",
       "14       上海               大华  https://www.liepin.com/job/1924620471.shtml   \n",
       "15       杭州  采埃孚传动技术(杭州)有限公司  https://www.liepin.com/job/1924467895.shtml   \n",
       "16       上海               大华  https://www.liepin.com/job/1924398333.shtml   \n",
       "17   杭州-萧山区  杭州依维柯汽车传动技术有限公司  https://www.liepin.com/job/1924146065.shtml   \n",
       "18   南京-江宁区            南高齿集团  https://www.liepin.com/job/1921592835.shtml   \n",
       "19   东莞-樟木头             罗曼智能  https://www.liepin.com/job/1919324131.shtml   \n",
       "20       深圳     深圳市芯天下技术有限公司  https://www.liepin.com/job/1918820879.shtml   \n",
       "21       南宁       华劲集团股份有限公司  https://www.liepin.com/job/1918258855.shtml   \n",
       "22       郑州             中原期货  https://www.liepin.com/job/1917852823.shtml   \n",
       "23       温州   浙江田野餐饮管理连锁有限公司  https://www.liepin.com/job/1916431327.shtml   \n",
       "24       深圳  深圳市华剑建设集团股份有限公司  https://www.liepin.com/job/1915522635.shtml   \n",
       "25       南宁       华劲集团股份有限公司  https://www.liepin.com/job/1910982541.shtml   \n",
       "26       南宁       华劲集团股份有限公司  https://www.liepin.com/job/1910337843.shtml   \n",
       "27       赣州       华劲集团股份有限公司   https://www.liepin.com/job/196635577.shtml   \n",
       "28  南宁-民族大道       华劲集团股份有限公司   https://www.liepin.com/job/194755693.shtml   \n",
       "29  重庆-九龙坡区     眉山隆和旅游开发有限公司  https://www.liepin.com/job/1927029615.shtml   \n",
       "30       苏州       江苏中意包装有限公司  https://www.liepin.com/job/1926976593.shtml   \n",
       "31    广州-大石       海南泰迪医药有限公司  https://www.liepin.com/job/1926915453.shtml   \n",
       "32   深圳-南山区     国奥科技(深圳)有限公司  https://www.liepin.com/job/1926839859.shtml   \n",
       "33       北京         华润置地华北大区  https://www.liepin.com/job/1926333679.shtml   \n",
       "34       郑州       海南泰迪医药有限公司  https://www.liepin.com/job/1926022811.shtml   \n",
       "35   成都-高新区     眉山隆和旅游开发有限公司  https://www.liepin.com/job/1925878877.shtml   \n",
       "36       上海   旺玖智能科技(上海)有限公司  https://www.liepin.com/job/1925397603.shtml   \n",
       "37  郑州-郑东新区             中原期货  https://www.liepin.com/job/1924999473.shtml   \n",
       "38  郑州-郑东新区             中原期货  https://www.liepin.com/job/1918198625.shtml   \n",
       "39                     博康健基因  https://www.liepin.com/job/1927087201.shtml   \n",
       "\n",
       "                                       公司URL  \n",
       "0    https://www.liepin.com/company/8030311/  \n",
       "1    https://www.liepin.com/company/7938232/  \n",
       "2    https://www.liepin.com/company/8037821/  \n",
       "3    https://www.liepin.com/company/8037821/  \n",
       "4    https://www.liepin.com/company/7916453/  \n",
       "5    https://www.liepin.com/company/7938232/  \n",
       "6    https://www.liepin.com/company/9987923/  \n",
       "7    https://www.liepin.com/company/9987923/  \n",
       "8    https://www.liepin.com/company/8185522/  \n",
       "9     https://www.liepin.com/company/470396/  \n",
       "10   https://www.liepin.com/company/9328149/  \n",
       "11   https://www.liepin.com/company/7882427/  \n",
       "12   https://www.liepin.com/company/7964307/  \n",
       "13   https://www.liepin.com/company/1380401/  \n",
       "14   https://www.liepin.com/company/1380401/  \n",
       "15   https://www.liepin.com/company/8042465/  \n",
       "16   https://www.liepin.com/company/1380401/  \n",
       "17   https://www.liepin.com/company/9796943/  \n",
       "18   https://www.liepin.com/company/8037821/  \n",
       "19   https://www.liepin.com/company/8732442/  \n",
       "20   https://www.liepin.com/company/8239498/  \n",
       "21   https://www.liepin.com/company/7938232/  \n",
       "22   https://www.liepin.com/company/8699497/  \n",
       "23   https://www.liepin.com/company/9103408/  \n",
       "24   https://www.liepin.com/company/7882427/  \n",
       "25   https://www.liepin.com/company/7938232/  \n",
       "26   https://www.liepin.com/company/7938232/  \n",
       "27   https://www.liepin.com/company/7938232/  \n",
       "28   https://www.liepin.com/company/7938232/  \n",
       "29  https://www.liepin.com/company/10149725/  \n",
       "30  https://www.liepin.com/company/12157481/  \n",
       "31   https://www.liepin.com/company/9328149/  \n",
       "32   https://www.liepin.com/company/9923085/  \n",
       "33   https://www.liepin.com/company/7916548/  \n",
       "34   https://www.liepin.com/company/9328149/  \n",
       "35  https://www.liepin.com/company/10149725/  \n",
       "36  https://www.liepin.com/company/10286737/  \n",
       "37   https://www.liepin.com/company/8699497/  \n",
       "38   https://www.liepin.com/company/8699497/  \n",
       "39   https://www.liepin.com/company/7855495/  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# A-1   单一页面\n",
    "url = \"https://www.liepin.com/zhaopin/?keyword=PRD\"\n",
    "session = HTMLSession()\n",
    "r = session.get( url )\n",
    "\n",
    "# 先取特定元素, 精准打击其子后辈\n",
    "主要元素 = r.html.xpath( \\\n",
    "    '//ul[@class=\"sojob-list\"]/li')\n",
    "\n",
    "# 预期是一个元素的列表？\n",
    "#print (主要元素[0].xpath('//div[contains(@class,\"sojob-item-main\")]'))\n",
    "#print (主要元素[0].xpath('//div[contains(@class,\"job-info\")]/h3/a'))\n",
    "#print (主要元素[3].xpath('//div[contains(@class,\"job-info\")]/p/a'))\n",
    "#print (主要元素[3].xpath('//div[contains(@class,\"job-info\")]/p/span[@class=\"text-warning\"]'))\n",
    "#print (主要元素[3].xpath('//div[contains(@class,\"job-info\")]/p/span[@class=\"edu\"]/following-sibling::span'))\n",
    "#print (主要元素[3].xpath('//div[contains(@class,\"job-info\")]/p/time/@title'))\n",
    "#print (主要元素[0].xpath('//div[contains(@class,\"sojob-item-main\")]//p[@class=\"company-name\"]/a'))\n",
    "\n",
    "# 作为xpath字典，键为我要抓的牛肉名称，值为xpath\n",
    "dict_xpaths={ \n",
    "    'text': {\n",
    "        'edu':      '//div[contains(@class,\"job-info\")]/p/span[@class=\"edu\"]',\n",
    "        '经验':      '//div[contains(@class,\"job-info\")]/p/span[@class=\"edu\"]/following-sibling::span',\n",
    "        '薪水':    '//div[contains(@class,\"job-info\")]/p/span[@class=\"text-warning\"]', \n",
    "        '时间':    '//div[contains(@class,\"job-info\")]/p/time/@title', \n",
    "        '职称':    '//div[contains(@class,\"job-info\")]/h3/a', \n",
    "        '公司地点': '//div[contains(@class,\"job-info\")]/p/a',\n",
    "        '公司名称': '//div[contains(@class,\"sojob-item-main\")]//p[@class=\"company-name\"]/a', \n",
    "    },\n",
    "    'text_content': {\n",
    "    },\n",
    "    'href': {\n",
    "        '链结':    '//div[contains(@class,\"job-info\")]/h3/a', \n",
    "        '公司URL': '//div[contains(@class,\"sojob-item-main\")]//p[@class=\"company-name\"]/a', \n",
    "    }\n",
    "}\n",
    "\n",
    "def get_e_text_content(_xpath_):\n",
    "    # 高级列表推导\n",
    "    暂存结果 = [e.xpath(_xpath_)[0].lxml.text_content() for e in 主要元素]\n",
    "    return(暂存结果)\n",
    "\n",
    "def get_e_text(_xpath_):\n",
    "    # 高级列表推导\n",
    "    暂存结果 = [\"\".join([x.strip() if type(x) is str else x.text.strip() for x in e.xpath(_xpath_)]) for e in 主要元素]\n",
    "    return(暂存结果)\n",
    "\n",
    "def get_e_href(_xpath_):\n",
    "    # 高级列表推导\n",
    "    暂存结果 = [list(e.xpath(_xpath_, first=True).absolute_links)[0] \\\n",
    "               if len(e.xpath(_xpath_, first=True).absolute_links) >= 1  \\\n",
    "               else \"\" for e in 主要元素]\n",
    "    return(暂存结果)\n",
    "\n",
    "# 只对主要元素下进行.xpath取值\n",
    "数据字典 = dict()\n",
    "\n",
    "数据字典 = {k:get_e_text_content(v) for k,v in dict_xpaths['text_content'].items()}\n",
    "数据字典.update({k:get_e_text(v) for k,v in dict_xpaths['text'].items()})\n",
    "数据字典.update({k:get_e_href(v) for k,v in dict_xpaths['href'].items()})\n",
    "\n",
    "[len(v) for k,v in 数据字典.items()]\n",
    "\n",
    "数据 = pd.DataFrame(数据字典)\n",
    "数据.to_excel(\"20春_Web数据挖掘_week03_liepin.xlsx\", sheet_name=\"搜查结果\")\n",
    "数据 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用urllib3 解析 url \n",
    "上面的url应该触动不同的页面查询，但能不能轻松无误的拆分url并进行比较？\n",
    "\n",
    "### urllib模块功能介绍\n",
    "* urlparse \n",
    "返回的6个部分，分别是：scheme(机制)丶netloc(网络位置)丶path(路径)丶params(路径段参数)丶query(查询)丶fragment(片段)。\n",
    "* parse_qs\n",
    "返回query(查询)多个部分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[ParseResult(scheme='', netloc='', path='/zhaopin/', params='', query='init=-1&headckid=1e6dae188c31324c&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=155&ckid=1e6dae188c31324c&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=7de6fbaf0e194e58f87331cdb1a9a10c&d_curPage=0&d_pageSize=40&d_headId=7de6fbaf0e194e58f87331cdb1a9a10c', fragment=''),\n",
       " ParseResult(scheme='', netloc='', path='/zhaopin/', params='', query='init=-1&headckid=1e6dae188c31324c&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=182&ckid=1e6dae188c31324c&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=7de6fbaf0e194e58f87331cdb1a9a10c&d_curPage=0&d_pageSize=40&d_headId=7de6fbaf0e194e58f87331cdb1a9a10c', fragment=''),\n",
       " ParseResult(scheme='', netloc='', path='/zhaopin/', params='', query='init=-1&headckid=1e6dae188c31324c&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=186&ckid=1e6dae188c31324c&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=7de6fbaf0e194e58f87331cdb1a9a10c&d_curPage=0&d_pageSize=40&d_headId=7de6fbaf0e194e58f87331cdb1a9a10c', fragment=''),\n",
       " ParseResult(scheme='', netloc='', path='/zhaopin/', params='', query='init=-1&headckid=1e6dae188c31324c&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=189&ckid=1e6dae188c31324c&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=7de6fbaf0e194e58f87331cdb1a9a10c&d_curPage=0&d_pageSize=40&d_headId=7de6fbaf0e194e58f87331cdb1a9a10c', fragment=''),\n",
       " ParseResult(scheme='', netloc='', path='/zhaopin/', params='', query='init=-1&headckid=1e6dae188c31324c&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=130&ckid=1e6dae188c31324c&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=7de6fbaf0e194e58f87331cdb1a9a10c&d_curPage=0&d_pageSize=40&d_headId=7de6fbaf0e194e58f87331cdb1a9a10c', fragment=''),\n",
       " ParseResult(scheme='', netloc='', path='/zhaopin/', params='', query='init=-1&headckid=1e6dae188c31324c&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=156&ckid=1e6dae188c31324c&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=7de6fbaf0e194e58f87331cdb1a9a10c&d_curPage=0&d_pageSize=40&d_headId=7de6fbaf0e194e58f87331cdb1a9a10c', fragment='')]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-1 使用 urllib.parse 解析\n",
    "from urllib.parse import urlparse, parse_qs\n",
    "[ urlparse(x) for x in 公司数据选择器链结.values()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[<Element 'div' class=('search-conditions',) data-selector='search-conditions'>]\n",
      "<Element 'div' class=('search-conditions',) data-selector='search-conditions'>\n",
      "[<Element 'dt' class=('search-title',)>, <Element 'dt' class=('search-title',)>, <Element 'dt' class=('search-title',)>, <Element 'dt' class=('search-title',)>, <Element 'dt' class=('search-title',)>]\n",
      "公司：\n",
      "行业：\n",
      "城市：\n",
      "薪资：\n",
      "更多：\n",
      "<Element 'dd' class=('comp-list',)>\n",
      "<Element 'dd' class=('short-dd', 'select-industry') data-param='industries'>\n",
      "<Element 'dd' data-param='city'>\n",
      "<Element 'dd' data-param='salary'>\n",
      "<Element 'dd' class=('dropdown', 'dropdown-time')>\n",
      "<Element 'dd' class=('dropdown', 'dropdown-jobkind')>\n",
      "<Element 'dd' class=('dropdown', 'dropdown-compscale')>\n",
      "<Element 'dd' class=('dropdown', 'dropdown-compkind')>\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'中国500强': '/zhaopin/?init=-1&headckid=609e717fd0beeed7&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=155&ckid=609e717fd0beeed7&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=07e09d3bd0b0a5edb9a91ba3508a8f3e&d_curPage=0&d_pageSize=40&d_headId=07e09d3bd0b0a5edb9a91ba3508a8f3e',\n",
       " '2018互联网300强': '/zhaopin/?init=-1&headckid=609e717fd0beeed7&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=182&ckid=609e717fd0beeed7&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=07e09d3bd0b0a5edb9a91ba3508a8f3e&d_curPage=0&d_pageSize=40&d_headId=07e09d3bd0b0a5edb9a91ba3508a8f3e',\n",
       " '制造业500强': '/zhaopin/?init=-1&headckid=609e717fd0beeed7&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=186&ckid=609e717fd0beeed7&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=07e09d3bd0b0a5edb9a91ba3508a8f3e&d_curPage=0&d_pageSize=40&d_headId=07e09d3bd0b0a5edb9a91ba3508a8f3e',\n",
       " 'AI创新成长50强 ': '/zhaopin/?init=-1&headckid=609e717fd0beeed7&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=189&ckid=609e717fd0beeed7&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=07e09d3bd0b0a5edb9a91ba3508a8f3e&d_curPage=0&d_pageSize=40&d_headId=07e09d3bd0b0a5edb9a91ba3508a8f3e',\n",
       " '独角兽': '/zhaopin/?init=-1&headckid=609e717fd0beeed7&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=130&ckid=609e717fd0beeed7&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=07e09d3bd0b0a5edb9a91ba3508a8f3e&d_curPage=0&d_pageSize=40&d_headId=07e09d3bd0b0a5edb9a91ba3508a8f3e',\n",
       " '上市公司': '/zhaopin/?init=-1&headckid=609e717fd0beeed7&flushckid=1&fromSearchBtn=2&keyword=PRD&compTag=156&ckid=609e717fd0beeed7&siTag=1B2M2Y8AsgTpgAmY7PhCfg%7EfA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=07e09d3bd0b0a5edb9a91ba3508a8f3e&d_curPage=0&d_pageSize=40&d_headId=07e09d3bd0b0a5edb9a91ba3508a8f3e'}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# A-2 扩张 公司 ?  \n",
    "\n",
    "# 先取特定元素, 精准打击其子后辈\n",
    "主要元素 = r.html.xpath('//div[@data-selector=\"search-conditions\"]')\n",
    "# 预期是一个元素的列表？\n",
    "print (主要元素)\n",
    "print (主要元素[0])\n",
    "print (主要元素[0].xpath('//dt[@class=\"search-title\"]'))\n",
    "\n",
    "list_search_title = 主要元素[0].xpath('//dt[@class=\"search-title\"]')\n",
    "for x in list_search_title:\n",
    "    print (x.text)\n",
    "    \n",
    "list_search_dd = 主要元素[0].xpath('//dt[@class=\"search-title\"]/following-sibling::dd')\n",
    "for x in list_search_dd:\n",
    "    print (x)  \n",
    "    \n",
    "\n",
    "公司数据选择器链结 = r.html.xpath('//div[@data-selector=\"search-conditions\"]')[0] \\\n",
    "                    .xpath('//dt[@class=\"search-title\"]/following-sibling::dd')[0] \\\n",
    "                    .xpath('//div[contains(@class,\"hot-comp-tags\")]/a/@href')\n",
    "               \n",
    "公司数据选择器链结\n",
    "\n",
    "# 但我们需要知道这些选择器链结, 对映到什麽数据\n",
    "公司数据选择器链结 = r.html.xpath('//div[@data-selector=\"search-conditions\"]')[0] \\\n",
    "                    .xpath('//dt[@class=\"search-title\"]/following-sibling::dd')[0] \\\n",
    "                    .xpath('//div[contains(@class,\"hot-comp-tags\")]/a')\n",
    "公司数据选择器链结\n",
    "\n",
    "#[ x.xpath(\"a/@href\")[0] for x in 公司数据选择器链结]\n",
    "#[ x.xpath(\"a/text()\")[0] for x in 公司数据选择器链结]\n",
    "公司数据选择器链结 = { x.xpath(\"a/text()\")[0]:x.xpath(\"a/@href\")[0] for x in 公司数据选择器链结}\n",
    "公司数据选择器链结"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 6 entries, 0 to 5\n",
      "Data columns (total 6 columns):\n",
      " #   Column    Non-Null Count  Dtype \n",
      "---  ------    --------------  ----- \n",
      " 0   scheme    6 non-null      object\n",
      " 1   netloc    6 non-null      object\n",
      " 2   path      6 non-null      object\n",
      " 3   params    6 non-null      object\n",
      " 4   query     6 non-null      object\n",
      " 5   fragment  6 non-null      object\n",
      "dtypes: object(6)\n",
      "memory usage: 416.0+ bytes\n",
      "scheme      1\n",
      "netloc      1\n",
      "path        1\n",
      "params      1\n",
      "query       6\n",
      "fragment    1\n",
      "dtype: int64\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>scheme</th>\n",
       "      <th>netloc</th>\n",
       "      <th>path</th>\n",
       "      <th>params</th>\n",
       "      <th>query</th>\n",
       "      <th>fragment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>/zhaopin/</td>\n",
       "      <td></td>\n",
       "      <td>init=-1&amp;headckid=609e717fd0beeed7&amp;flushckid=1&amp;...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  scheme netloc       path params  \\\n",
       "0                /zhaopin/          \n",
       "\n",
       "                                               query fragment  \n",
       "0  init=-1&headckid=609e717fd0beeed7&flushckid=1&...           "
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-2 使用 pd.DataFrame进行 unuinque()相异值计量比对 \n",
    "import pandas as pd\n",
    "df = pd.DataFrame([ urlparse(x) for x in 公司数据选择器链结.values()])\n",
    "df.info()\n",
    "print(df.nunique())\n",
    "df.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "init             1\n",
      "headckid         1\n",
      "flushckid        1\n",
      "fromSearchBtn    1\n",
      "keyword          1\n",
      "compTag          6\n",
      "ckid             1\n",
      "siTag            1\n",
      "d_sfrom          1\n",
      "d_ckId           1\n",
      "d_curPage        1\n",
      "d_pageSize       1\n",
      "d_headId         1\n",
      "dtype: int64\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>keyword</th>\n",
       "      <th>compTag</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>PRD</td>\n",
       "      <td>155</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>PRD</td>\n",
       "      <td>182</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>PRD</td>\n",
       "      <td>186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>PRD</td>\n",
       "      <td>189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>PRD</td>\n",
       "      <td>130</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>PRD</td>\n",
       "      <td>156</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  keyword compTag\n",
       "0     PRD     155\n",
       "1     PRD     182\n",
       "2     PRD     186\n",
       "3     PRD     189\n",
       "4     PRD     130\n",
       "5     PRD     156"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-3 Parse each query string further into its individual parameters\n",
    "# parse_qs returns every value as a list, so keep only the first item of each\n",
    "df_qs = pd.DataFrame([{k: v[0] for k, v in parse_qs(x).items()} for x in df['query']])\n",
    "print(df_qs.nunique())  # number of distinct values per parameter\n",
    "df_qs[['keyword', 'compTag']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary\n",
    "* `compTag` is the company-type selector: each value maps to a different category of company.\n",
    "* `keyword` is the search keyword."
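    ,"\n",
    "\n",
    "The two findings above can be tried out in a minimal sketch. It is an illustration rather than part of the lecture code: the URL is a shortened, hypothetical example that keeps only three of the parameters seen in the query, and `params` is a name introduced here.\n",
    "\n",
    "```python\n",
    "from urllib.parse import urlparse, parse_qs, urlencode\n",
    "\n",
    "# Hypothetical, shortened search URL (assumption: only three parameters kept)\n",
    "url = 'https://www.liepin.com/zhaopin/?keyword=PRD&compTag=155&d_curPage=0'\n",
    "\n",
    "# parse_qs returns each value as a list, so keep the first item of each\n",
    "params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}\n",
    "print(params)\n",
    "\n",
    "# Change the two parameters of interest, then encode the dict back into a query string\n",
    "params.update({'keyword': '用户体验', 'compTag': '182'})\n",
    "print(urlencode(params))\n",
    "```\n",
    "\n",
    "Only the values of `keyword` and `compTag` differ between the original URL and the re-encoded query string."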
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'init': ['-1'], 'headckid': ['1e6dae188c31324c'], 'flushckid': ['1'], 'fromSearchBtn': ['2'], 'keyword': ['PRD'], 'compTag': ['155'], 'ckid': ['1e6dae188c31324c'], 'siTag': ['1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw'], 'd_sfrom': ['search_unknown'], 'd_ckId': ['7de6fbaf0e194e58f87331cdb1a9a10c'], 'd_curPage': ['0'], 'd_pageSize': ['40'], 'd_headId': ['7de6fbaf0e194e58f87331cdb1a9a10c']}\n",
      "{'中国500强': '155', '2018互联网300强': '182', '制造业500强': '186', 'AI创新成长50强 ': '189', '独角兽': '130', '上市公司': '156'}\n"
     ]
    }
   ],
   "source": [
    "# B-4 Build the parameter template (参数模板) and the compTag dictionary (字典_compTag)\n",
    "def parse_url_qs_for_compTag(url):\n",
    "    # Split the URL into its six parts, then parse the query part into a dict of lists\n",
    "    six_parts = urlparse(url)\n",
    "    return parse_qs(six_parts.query)\n",
    "\n",
    "# Use the parameters of the first selector link as the template\n",
    "参数模板 = parse_url_qs_for_compTag(list(公司数据选择器链结.values())[0])\n",
    "print(参数模板)\n",
    "\n",
    "# Map each company-type label to its compTag value\n",
    "字典_compTag = {k: parse_url_qs_for_compTag(v)['compTag'][0] for k, v in 公司数据选择器链结.items()}\n",
    "print(字典_compTag)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'中国500强': {'init': ['-1'], 'headckid': ['1e6dae188c31324c'], 'flushckid': ['1'], 'fromSearchBtn': ['2'], 'keyword': ['用户体验'], 'compTag': ['155'], 'ckid': ['1e6dae188c31324c'], 'siTag': ['1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw'], 'd_sfrom': ['search_unknown'], 'd_ckId': ['7de6fbaf0e194e58f87331cdb1a9a10c'], 'd_curPage': ['0'], 'd_pageSize': ['40'], 'd_headId': ['7de6fbaf0e194e58f87331cdb1a9a10c']}, '2018互联网300强': {'init': ['-1'], 'headckid': ['1e6dae188c31324c'], 'flushckid': ['1'], 'fromSearchBtn': ['2'], 'keyword': ['用户体验'], 'compTag': ['182'], 'ckid': ['1e6dae188c31324c'], 'siTag': ['1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw'], 'd_sfrom': ['search_unknown'], 'd_ckId': ['7de6fbaf0e194e58f87331cdb1a9a10c'], 'd_curPage': ['0'], 'd_pageSize': ['40'], 'd_headId': ['7de6fbaf0e194e58f87331cdb1a9a10c']}, '制造业500强': {'init': ['-1'], 'headckid': ['1e6dae188c31324c'], 'flushckid': ['1'], 'fromSearchBtn': ['2'], 'keyword': ['用户体验'], 'compTag': ['186'], 'ckid': ['1e6dae188c31324c'], 'siTag': ['1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw'], 'd_sfrom': ['search_unknown'], 'd_ckId': ['7de6fbaf0e194e58f87331cdb1a9a10c'], 'd_curPage': ['0'], 'd_pageSize': ['40'], 'd_headId': ['7de6fbaf0e194e58f87331cdb1a9a10c']}, 'AI创新成长50强 ': {'init': ['-1'], 'headckid': ['1e6dae188c31324c'], 'flushckid': ['1'], 'fromSearchBtn': ['2'], 'keyword': ['用户体验'], 'compTag': ['189'], 'ckid': ['1e6dae188c31324c'], 'siTag': ['1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw'], 'd_sfrom': ['search_unknown'], 'd_ckId': ['7de6fbaf0e194e58f87331cdb1a9a10c'], 'd_curPage': ['0'], 'd_pageSize': ['40'], 'd_headId': ['7de6fbaf0e194e58f87331cdb1a9a10c']}, '独角兽': {'init': ['-1'], 'headckid': ['1e6dae188c31324c'], 'flushckid': ['1'], 'fromSearchBtn': ['2'], 'keyword': ['用户体验'], 'compTag': ['130'], 'ckid': ['1e6dae188c31324c'], 'siTag': ['1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw'], 'd_sfrom': ['search_unknown'], 'd_ckId': ['7de6fbaf0e194e58f87331cdb1a9a10c'], 'd_curPage': ['0'], 'd_pageSize': ['40'], 'd_headId': ['7de6fbaf0e194e58f87331cdb1a9a10c']}, '上市公司': {'init': ['-1'], 'headckid': ['1e6dae188c31324c'], 'flushckid': ['1'], 'fromSearchBtn': ['2'], 'keyword': ['用户体验'], 'compTag': ['156'], 'ckid': ['1e6dae188c31324c'], 'siTag': ['1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw'], 'd_sfrom': ['search_unknown'], 'd_ckId': ['7de6fbaf0e194e58f87331cdb1a9a10c'], 'd_curPage': ['0'], 'd_pageSize': ['40'], 'd_headId': ['7de6fbaf0e194e58f87331cdb1a9a10c']}}\n"
     ]
    }
   ],
   "source": [
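    "# B-5 Generate parameter sets from the template\n",
    "def 参数模板生成(compTag, keyword):\n",
    "    # A shallow copy is enough here: the two keys are reassigned, not mutated in place\n",
    "    参数 = 参数模板.copy()\n",
    "    参数['compTag'] = compTag\n",
    "    参数['keyword'] = keyword\n",
    "    return 参数\n",
    "\n",
    "# One parameter set per company type, all searching for the keyword 用户体验 (user experience)\n",
    "参数_compTag_用户体验 = {k: 参数模板生成(compTag=[v], keyword=['用户体验']) for k, v in 字典_compTag.items()}\n",
    "print(参数_compTag_用户体验)"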
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generating the requests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'https://www.liepin.com/zhaopin/?init=-1&headckid=1e6dae188c31324c&flushckid=1&fromSearchBtn=2&keyword=%E7%94%A8%E6%88%B7%E4%BD%93%E9%AA%8C&compTag=155&ckid=1e6dae188c31324c&siTag=1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_ckId=7de6fbaf0e194e58f87331cdb1a9a10c&d_curPage=0&d_pageSize=40&d_headId=7de6fbaf0e194e58f87331cdb1a9a10c'"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# C-1 Preparing for multiple pages, test 1: 中国500强 (China Top 500)\n",
    "url = \"https://www.liepin.com/zhaopin/\"\n",
    "session = HTMLSession()\n",
    "payload = 参数_compTag_用户体验['中国500强']\n",
    "r = session.get( url, params = payload)\n",
    "r.url"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "# C-2 Simplified version of A-1: fetch and parse a single page\n",
    "session = HTMLSession()\n",
    "\n",
    "def requests_liepin(url, params):\n",
    "    # Send the query parameters passed in by the caller\n",
    "    r = session.get(url, params=params)\n",
    "\n",
    "    # Select the job-listing items first, then query only their descendants\n",
    "    主要元素 = r.html.xpath( '//ul[@class=\"sojob-list\"]/li')\n",
    "\n",
    "    # XPath dictionary: keys are the field names to extract, values are the XPaths\n",
    "    dict_xpaths={ \n",
    "        'text': {\n",
    "            'edu':      '//div[contains(@class,\"job-info\")]/p/span[@class=\"edu\"]',\n",
    "            '经验':      '//div[contains(@class,\"job-info\")]/p/span[@class=\"edu\"]/following-sibling::span',\n",
    "            '薪水':    '//div[contains(@class,\"job-info\")]/p/span[@class=\"text-warning\"]', \n",
    "            '时间':    '//div[contains(@class,\"job-info\")]/p/time/@title', \n",
    "            '职称':    '//div[contains(@class,\"job-info\")]/h3/a', \n",
    "            '公司地点': '//div[contains(@class,\"job-info\")]/p/a',\n",
    "            '公司名称': '//div[contains(@class,\"sojob-item-main\")]//p[@class=\"company-name\"]/a', \n",
    "        },\n",
    "        'text_content': {\n",
    "        },\n",
    "        'href': {\n",
    "            '链结':    '//div[contains(@class,\"job-info\")]/h3/a', \n",
    "            '公司URL': '//div[contains(@class,\"sojob-item-main\")]//p[@class=\"company-name\"]/a', \n",
    "        }\n",
    "    }\n",
    "\n",
    "    def get_e_text_content(_xpath_):\n",
    "        # text_content() of the first match under each main element\n",
    "        暂存结果 = [e.xpath(_xpath_)[0].lxml.text_content() for e in 主要元素]\n",
    "        return(暂存结果)\n",
    "\n",
    "    def get_e_text(_xpath_):\n",
    "        # Join the stripped text of all matches; attribute XPaths return plain strings\n",
    "        暂存结果 = [\"\".join([x.strip() if type(x) is str else x.text.strip() for x in e.xpath(_xpath_)]) for e in 主要元素]\n",
    "        return(暂存结果)\n",
    "\n",
    "    def get_e_href(_xpath_):\n",
    "        # First absolute link of the matched element, or an empty string if it has none\n",
    "        暂存结果 = [list(e.xpath(_xpath_, first=True).absolute_links)[0] \\\n",
    "                   if len(e.xpath(_xpath_, first=True).absolute_links) >= 1  \\\n",
    "                   else \"\" for e in 主要元素]\n",
    "        return(暂存结果)\n",
    "\n",
    "    # Evaluate the XPaths only under the main elements\n",
    "    数据字典 = dict()\n",
    "\n",
    "    数据字典 = {k:get_e_text_content(v) for k,v in dict_xpaths['text_content'].items()}\n",
    "    数据字典.update({k:get_e_text(v) for k,v in dict_xpaths['text'].items()})\n",
    "    数据字典.update({k:get_e_href(v) for k,v in dict_xpaths['href'].items()})\n",
    "\n",
    "    数据 = pd.DataFrame(数据字典)\n",
    "    #数据.to_excel(\"20春_Web数据挖掘_week03_liepin.xlsx\", sheet_name=\"搜查结果\")\n",
    "    return (数据)\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>edu</th>\n",
       "      <th>经验</th>\n",
       "      <th>薪水</th>\n",
       "      <th>时间</th>\n",
       "      <th>职称</th>\n",
       "      <th>公司地点</th>\n",
       "      <th>公司名称</th>\n",
       "      <th>链结</th>\n",
       "      <th>公司URL</th>\n",
       "      <th>热门公司类型</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月30日</td>\n",
       "      <td>大客户销售经理-北京-网易严选</td>\n",
       "      <td>北京-五道口</td>\n",
       "      <td>网易集团</td>\n",
       "      <td>https://www.liepin.com/job/1926756751.shtml</td>\n",
       "      <td>https://www.liepin.com/company/5964833/</td>\n",
       "      <td>中国500强</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月30日</td>\n",
       "      <td>阿里云智能事业群-数据技术专家(金融行业)-北京/杭州</td>\n",
       "      <td>杭州</td>\n",
       "      <td>阿里巴巴</td>\n",
       "      <td>https://www.liepin.com/job/1927063431.shtml</td>\n",
       "      <td>https://www.liepin.com/company/1072424/</td>\n",
       "      <td>中国500强</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月27日</td>\n",
       "      <td>钉钉(Dingtalk)-搜索中心-Java开发技术专家</td>\n",
       "      <td>杭州</td>\n",
       "      <td>阿里巴巴</td>\n",
       "      <td>https://www.liepin.com/job/1926996383.shtml</td>\n",
       "      <td>https://www.liepin.com/company/1072424/</td>\n",
       "      <td>中国500强</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>大专及以上</td>\n",
       "      <td>2年以上</td>\n",
       "      <td>6-8k·13薪</td>\n",
       "      <td>2020年03月25日</td>\n",
       "      <td>员工关系专员</td>\n",
       "      <td>廊坊-广阳区</td>\n",
       "      <td>中国国际技术智力合作有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1926938099.shtml</td>\n",
       "      <td>https://www.liepin.com/company/1233751/</td>\n",
       "      <td>中国500强</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>25-50k·12薪</td>\n",
       "      <td>2020年03月24日</td>\n",
       "      <td>钉钉(DingTalk)-安全运营专家-安全产品及中心</td>\n",
       "      <td>杭州</td>\n",
       "      <td>阿里巴巴</td>\n",
       "      <td>https://www.liepin.com/job/1926923363.shtml</td>\n",
       "      <td>https://www.liepin.com/company/1072424/</td>\n",
       "      <td>中国500强</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>6年以上</td>\n",
       "      <td>15-20k·13薪</td>\n",
       "      <td>2020年03月26日</td>\n",
       "      <td>法务经理/主任</td>\n",
       "      <td>深圳</td>\n",
       "      <td>中国南玻集团股份有限公司</td>\n",
       "      <td>https://www.liepin.com/job/1926955487.shtml</td>\n",
       "      <td>https://www.liepin.com/company/9091167/</td>\n",
       "      <td>上市公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>统招本科</td>\n",
       "      <td>10年以上</td>\n",
       "      <td>面议</td>\n",
       "      <td>2020年03月26日</td>\n",
       "      <td>CHO/HRD</td>\n",
       "      <td>上海</td>\n",
       "      <td>银科控股</td>\n",
       "      <td>https://www.liepin.com/job/1915800458.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8582797/</td>\n",
       "      <td>上市公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>3年以上</td>\n",
       "      <td>20-30k·12薪</td>\n",
       "      <td>2020年03月25日</td>\n",
       "      <td>SAP 运维顾问</td>\n",
       "      <td>北京</td>\n",
       "      <td>科兴</td>\n",
       "      <td>https://www.liepin.com/job/1926949105.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8593199/</td>\n",
       "      <td>上市公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>1年以上</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "      <td>2020年03月25日</td>\n",
       "      <td>新闻短视频运营 (MJ000067)</td>\n",
       "      <td>北京</td>\n",
       "      <td>凤凰新媒体</td>\n",
       "      <td>https://www.liepin.com/job/1925965933.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8139695/</td>\n",
       "      <td>上市公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>本科及以上</td>\n",
       "      <td>1年以上</td>\n",
       "      <td>15-30k·12薪</td>\n",
       "      <td>2020年03月25日</td>\n",
       "      <td>视频原创记者 (MJ000088)</td>\n",
       "      <td>北京</td>\n",
       "      <td>凤凰新媒体</td>\n",
       "      <td>https://www.liepin.com/job/1925961891.shtml</td>\n",
       "      <td>https://www.liepin.com/company/8139695/</td>\n",
       "      <td>上市公司</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>240 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      edu     经验          薪水           时间                            职称  \\\n",
       "0   本科及以上   3年以上          面议  2020年03月30日               大客户销售经理-北京-网易严选   \n",
       "1    统招本科   3年以上          面议  2020年03月30日   阿里云智能事业群-数据技术专家(金融行业)-北京/杭州   \n",
       "2   本科及以上   3年以上          面议  2020年03月27日  钉钉(Dingtalk)-搜索中心-Java开发技术专家   \n",
       "3   大专及以上   2年以上    6-8k·13薪  2020年03月25日                        员工关系专员   \n",
       "4   本科及以上   3年以上  25-50k·12薪  2020年03月24日   钉钉(DingTalk)-安全运营专家-安全产品及中心   \n",
       "..    ...    ...         ...          ...                           ...   \n",
       "35  本科及以上   6年以上  15-20k·13薪  2020年03月26日                       法务经理/主任   \n",
       "36   统招本科  10年以上          面议  2020年03月26日                       CHO/HRD   \n",
       "37  本科及以上   3年以上  20-30k·12薪  2020年03月25日                      SAP 运维顾问   \n",
       "38  本科及以上   1年以上  10-20k·12薪  2020年03月25日            新闻短视频运营 (MJ000067)   \n",
       "39  本科及以上   1年以上  15-30k·12薪  2020年03月25日             视频原创记者 (MJ000088)   \n",
       "\n",
       "      公司地点            公司名称                                           链结  \\\n",
       "0   北京-五道口            网易集团  https://www.liepin.com/job/1926756751.shtml   \n",
       "1       杭州            阿里巴巴  https://www.liepin.com/job/1927063431.shtml   \n",
       "2       杭州            阿里巴巴  https://www.liepin.com/job/1926996383.shtml   \n",
       "3   廊坊-广阳区  中国国际技术智力合作有限公司  https://www.liepin.com/job/1926938099.shtml   \n",
       "4       杭州            阿里巴巴  https://www.liepin.com/job/1926923363.shtml   \n",
       "..     ...             ...                                          ...   \n",
       "35      深圳    中国南玻集团股份有限公司  https://www.liepin.com/job/1926955487.shtml   \n",
       "36      上海            银科控股  https://www.liepin.com/job/1915800458.shtml   \n",
       "37      北京              科兴  https://www.liepin.com/job/1926949105.shtml   \n",
       "38      北京           凤凰新媒体  https://www.liepin.com/job/1925965933.shtml   \n",
       "39      北京           凤凰新媒体  https://www.liepin.com/job/1925961891.shtml   \n",
       "\n",
       "                                      公司URL  热门公司类型  \n",
       "0   https://www.liepin.com/company/5964833/  中国500强  \n",
       "1   https://www.liepin.com/company/1072424/  中国500强  \n",
       "2   https://www.liepin.com/company/1072424/  中国500强  \n",
       "3   https://www.liepin.com/company/1233751/  中国500强  \n",
       "4   https://www.liepin.com/company/1072424/  中国500强  \n",
       "..                                      ...     ...  \n",
       "35  https://www.liepin.com/company/9091167/    上市公司  \n",
       "36  https://www.liepin.com/company/8582797/    上市公司  \n",
       "37  https://www.liepin.com/company/8593199/    上市公司  \n",
       "38  https://www.liepin.com/company/8139695/    上市公司  \n",
       "39  https://www.liepin.com/company/8139695/    上市公司  \n",
       "\n",
       "[240 rows x 10 columns]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# C-3 Multiple pages: one request per company type\n",
    "url = \"https://www.liepin.com/zhaopin/\"\n",
    "\n",
    "list_df = list()\n",
    "for k,v in 参数_compTag_用户体验.items():\n",
    "    payload = v\n",
    "    df = requests_liepin( url, params = payload)\n",
    "    df = df.assign (热门公司类型 = k)    \n",
    "    list_df.append(df)\n",
    "\n",
    "df_all = pd.concat(list_df)\n",
    "df_all"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "# C-4 Export the combined results to Excel\n",
    "df_all.to_excel(\"20春_Web数据挖掘_week03_liepin_各热门公司类型.xlsx\", sheet_name=\"搜查结果\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "edu         7\n",
      "经验         10\n",
      "薪水         77\n",
      "时间         31\n",
      "职称        186\n",
      "公司地点       72\n",
      "公司名称       56\n",
      "链结        197\n",
      "公司URL      57\n",
      "热门公司类型      6\n",
      "dtype: int64\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>职称</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>公司名称</th>\n",
       "      <th>edu</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">华为</th>\n",
       "      <th>统招本科</th>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>本科及以上</th>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>网易集团</th>\n",
       "      <th>本科及以上</th>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>科大讯飞</th>\n",
       "      <th>本科及以上</th>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>盛虹控股集团有限公司</th>\n",
       "      <th>统招本科</th>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>深圳市优必选科技股份有限公司</th>\n",
       "      <th>本科及以上</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>天津深之蓝海洋设备科技有限公司</th>\n",
       "      <th>本科及以上</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>深圳视见医疗科技有限公司</th>\n",
       "      <th>大专及以上</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>天津深之蓝海洋设备科技有限公司</th>\n",
       "      <th>博士</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>龙信集团</th>\n",
       "      <th>统招本科</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>83 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                       职称\n",
       "公司名称            edu      \n",
       "华为              统招本科   30\n",
       "                本科及以上  18\n",
       "网易集团            本科及以上  17\n",
       "科大讯飞            本科及以上  14\n",
       "盛虹控股集团有限公司      统招本科   10\n",
       "...                    ..\n",
       "深圳市优必选科技股份有限公司  本科及以上   1\n",
       "天津深之蓝海洋设备科技有限公司 本科及以上   1\n",
       "深圳视见医疗科技有限公司    大专及以上   1\n",
       "天津深之蓝海洋设备科技有限公司 博士      1\n",
       "龙信集团            统招本科    1\n",
       "\n",
       "[83 rows x 1 columns]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# C-5 Basic pandas operations\n",
    "\n",
    "# Number of distinct values in each column\n",
    "print(df_all.nunique())\n",
    "\n",
    "# Job postings per company and education requirement, most frequent first\n",
    "df_all.groupby(['公司名称','edu']).agg({\"职称\":\"count\"}).sort_values(by='职称', ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
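    "# This week's exercises\n",
    "\n",
    "Reverse-engineer the following in the same way:\n",
    "\n",
    "## Params behind the upper part of the interface\n",
    "* Company: done (`compTag`)\n",
    "* Industry: ?\n",
    "* City: ?\n",
    "* Salary: ?\n",
    "## Params behind the lower part of the interface\n",
    "* The \"jump to page N\" control: ?\n",
    "## Also try changing\n",
    "* `keyword`\n"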
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "749px",
    "left": "1125.609375px",
    "top": "110px",
    "width": "281.390625px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
